Feat/attafosu/sglang OpenAI api compatibility #156
Conversation
Signed-off-by: attafosu <thomas.atta-fosu@intel.com>
Code Review
The pull request introduces a new llama3_8b_sglang preset for the CNN/DailyMail dataset, enhancing compatibility with the SGLang API. It also refines the Harmonize transform by adding a mode parameter and corresponding logic to handle both 'harmony' and 'plain' tokenization, which is crucial for flexible prompt processing. Additionally, optional fields in OpenAI API types (refusal, usage, system_fingerprint) are now explicitly set with None as a default, improving clarity and consistency. The changes are well-structured and contribute positively to the codebase.
```python
stream: bool = True,
max_new_tokens: int = 128,
temperature: float = 0.0,
top_p: float = 1.0,
top_k: int = 1,
```
The parameters stream, temperature, top_p, and top_k are defined in the llama3_8b_sglang function signature but never used in the function body. This is dead code, which can mislead readers into thinking these options take effect. If the parameters are not meant to be used, remove them; if they are, implement the behavior they promise.
Pull request overview
Adds an SGLang-focused CNN/DailyMail preset and improves OpenAI msgspec compatibility by making optional response fields truly optional during struct construction/decoding.
Changes:
- Make msgspec OpenAI response fields (`refusal`, `usage`, `system_fingerprint`) default to `None` for better OpenAI-compat decoding/serialization.
- Extend `Harmonize` with a `mode` option (`harmony` vs `plain`) and add a per-row guard to avoid overwriting pre-tokenized rows in fused row-processor pipelines.
- Add a new `llama3_8b_sglang` CNN/DailyMail preset that formats prompts and produces `input_tokens` via `Harmonize(mode="plain")`.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `src/inference_endpoint/openai/types.py` | Makes msgspec OpenAI response structs tolerant of omitted optional fields. |
| `src/inference_endpoint/dataset_manager/transforms.py` | Adds Harmonize plain-tokenization mode and overwrite-prevention in fused pipelines. |
| `src/inference_endpoint/dataset_manager/predefined/cnndailymail/presets.py` | Adds an SGLang-oriented Llama 3.1 8B preset producing `input_tokens`. |
```python
if self.tokenized_column in row and row[self.tokenized_column] is not None:
    return row
```
The pre-tokenized guard treats any non-None value as “already tokenized”. In pandas rows, missing values are often NaN (which is not None), so this would incorrectly skip tokenization and leave NaN in input_tokens, likely breaking downstream code that expects a list of token IDs. Consider using an explicit null check that treats NaN as missing (e.g., via pd.isna) before returning early.
```python
self.mode = mode
if self.mode not in {"harmony", "plain"}:
    raise ValueError(f"Invalid harmonize mode: {self.mode}")
```
Harmonize.__call__ still skips purely based on the presence of tokenized_column in df.columns, but process_row now skips only when the per-row value is non-null. This makes behavior differ depending on whether row processors are fused (or if fuse_row_processors=False is used). Consider aligning the dataframe-level skip logic with the row-level guard so the transform behaves consistently.
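One way to align the two paths is to compute a dataframe-level skip mask with the same null semantics as the per-row guard. This is a hypothetical sketch (function and column names assumed, not the repository's API):

```python
import pandas as pd

def dataframe_skip_mask(df: pd.DataFrame, tokenized_column: str) -> pd.Series:
    # True where the row already holds real tokens. NaN/None count as
    # missing, matching the per-row guard instead of a bare
    # "column exists" check.
    if tokenized_column not in df.columns:
        return pd.Series(False, index=df.index)
    return df[tokenized_column].notna()
```

`Harmonize.__call__` could then tokenize only the rows where the mask is `False`, so fused and non-fused pipelines behave identically.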
```diff
     Returns:
         Row dictionary with the harmonized prompt added
     """
-    row[self.tokenized_column] = self.harmonizer(row[self.prompt_column])
+    # Guard pre-tokenized rows: the SGLang adapter adds a default Harmonize
+    # (GPT-OSS tokenizer + harmony mode). When row processors are fused, the
+    # dataframe-level skip is bypassed, so without this guard, adapter
+    # Harmonize would overwrite input tokens. Alternative: remove Harmonize
+    # from the adapter transforms and require each SGLang preset to add its
+    # own Harmonize with the desired tokenizer/args.
+    if self.tokenized_column in row and row[self.tokenized_column] is not None:
+        return row
+    if self.mode == "plain":
+        tokens = self.harmonizer.to_tokens(row[self.prompt_column])
+        row[self.tokenized_column] = tokens
+    else:
+        row[self.tokenized_column] = self.harmonizer(row[self.prompt_column])
```
This change adds new Harmonize behavior (mode plus the overwrite-prevention guard when row processors are fused), but tests/unit/dataset_manager/test_transforms.py explicitly excludes Harmonize. Please add unit tests that cover (1) mode="plain" vs mode="harmony", and (2) fused pipelines where a second Harmonize should not overwrite existing input_tokens.
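A test along these lines could exercise both behaviors against a stand-in class. `FakeHarmonize` below is a deliberately simplified stand-in that mirrors the PR's guard/mode logic (the real class lives in `transforms.py` and wraps an actual tokenizer):

```python
class FakeHarmonize:
    """Toy stand-in mirroring Harmonize's mode + overwrite-guard logic."""

    def __init__(self, mode="harmony", prompt_column="prompt",
                 tokenized_column="input_tokens"):
        if mode not in {"harmony", "plain"}:
            raise ValueError(f"Invalid harmonize mode: {mode}")
        self.mode = mode
        self.prompt_column = prompt_column
        self.tokenized_column = tokenized_column

    def to_tokens(self, text):
        return [ord(c) for c in text]  # toy tokenizer

    def process_row(self, row):
        # Same guard as the PR: skip rows that already carry tokens.
        if self.tokenized_column in row and row[self.tokenized_column] is not None:
            return row
        if self.mode == "plain":
            row[self.tokenized_column] = self.to_tokens(row[self.prompt_column])
        else:
            row[self.tokenized_column] = f"<harmony>{row[self.prompt_column]}"
        return row

def test_plain_vs_harmony():
    plain = FakeHarmonize(mode="plain").process_row({"prompt": "hi"})
    assert plain["input_tokens"] == [104, 105]
    harmony = FakeHarmonize(mode="harmony").process_row({"prompt": "hi"})
    assert harmony["input_tokens"].startswith("<harmony>")

def test_fused_does_not_overwrite():
    # Simulates a fused pipeline: a second Harmonize sees pre-tokenized rows.
    row = {"prompt": "hi", "input_tokens": [1, 2, 3]}
    out = FakeHarmonize(mode="harmony").process_row(row)
    assert out["input_tokens"] == [1, 2, 3]
```

The same two cases, written against the real `Harmonize`, would close the coverage gap called out above.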
```diff
@@ -109,5 +109,5 @@ class ChatCompletionResponse(msgspec.Struct, kw_only=True, omit_defaults=True):
     created: int
     model: str
     choices: list[ChatCompletionChoice]
-    usage: CompletionUsage | None
-    system_fingerprint: str | None
+    usage: CompletionUsage | None = None
+    system_fingerprint: str | None = None
```
There are no unit tests covering the msgspec OpenAI types / msgspec adapter decode path. Since these fields now default to None to support responses that omit them, it would be good to add a test that decodes a minimal OpenAI-compatible response missing refusal, usage, and system_fingerprint and asserts decoding succeeds and fields are None.
```python
self.harmonizer = Harmonizer(
    tokenizer_name=tokenizer_name,
    encoding_name=encoding_name,
```
In mode="plain", process_row only calls self.harmonizer.to_tokens(...), but Harmonizer.__init__ still loads the Harmony encoding and constructs Harmony system content. That’s potentially expensive and unnecessary for plain tokenization. Consider a lightweight path for plain mode (e.g., defer encoding load until __call__ is used, or use the underlying tokenizer directly) to reduce init overhead.
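Deferring the load is straightforward with `functools.cached_property`. The sketch below is hypothetical (class name, `encoding_loader` callback, and callable encoding are assumed for illustration, not the real `Harmonizer` API):

```python
from functools import cached_property

class LazyHarmonizer:
    """Loads the cheap tokenizer eagerly, the Harmony encoding lazily."""

    def __init__(self, tokenizer, encoding_loader):
        self._tokenizer = tokenizer            # lightweight, loaded eagerly
        self._encoding_loader = encoding_loader  # expensive, deferred

    @cached_property
    def _harmony_encoding(self):
        # Runs only on the first harmony-mode call, then is cached.
        return self._encoding_loader()

    def to_tokens(self, text):
        # Plain mode touches only the underlying tokenizer.
        return self._tokenizer(text)

    def __call__(self, text):
        return self._harmony_encoding(text)
```

With this shape, a `mode="plain"` preset that only ever calls `to_tokens` never pays for the Harmony encoding load at all.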
* Handle case with string response: handles responses that arrive as a single string rather than a list. Needed for the AMD submission, which wasn't calculating TPOT without the fix. Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
What does this PR do?
Adds a unified CNN/DailyMail dataset preset supporting both the OpenAI-compatible and the native SGLang APIs.
Type of change
Related issues
Testing
Checklist